Graph size multi shard support - changes in JdbcIOWrapper #3706
VardhanThigle wants to merge 2 commits into GoogleCloudPlatform:main
Summary of Changes (Gemini Code Assist)
This pull request introduces support for multi-shard database configurations in the JdbcIoWrapper. It enables parallel schema discovery across multiple shards to improve startup performance. It also updates the internal identification mechanism to use table schema UUIDs, which makes tracking of table reads and completions more robust. The changes also include the refactoring needed to aggregate reader transforms across these shards.
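Keying progress on table name alone becomes ambiguous once several shards expose a table with the same name. The sketch below illustrates why a per-schema UUID avoids that. It assumes hypothetical ShardTable and CompletionTracker classes, which are not the PR's actual types, and it assumes each discovered table schema gets a random UUID.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for one discovered table on one shard (not the PR's schema class).
final class ShardTable {
  final String shardId;
  final String tableName;
  // Assumption: every discovered table schema gets its own random UUID.
  final UUID schemaUuid = UUID.randomUUID();

  ShardTable(String shardId, String tableName) {
    this.shardId = shardId;
    this.tableName = tableName;
  }
}

// Hypothetical tracker that records finished reads under two different keys.
final class CompletionTracker {
  private final Set<String> doneByName = new HashSet<>();
  private final Set<UUID> doneByUuid = new HashSet<>();

  void markComplete(ShardTable table) {
    doneByName.add(table.tableName);
    doneByUuid.add(table.schemaUuid);
  }

  // Name-based lookup: a finished "orders" on one shard marks every shard's "orders" done.
  boolean isCompleteByName(ShardTable table) {
    return doneByName.contains(table.tableName);
  }

  // UUID-based lookup: only the exact table schema that finished is reported done.
  boolean isCompleteByUuid(ShardTable table) {
    return doneByUuid.contains(table.schemaUuid);
  }
}
```

Usage: after marking "orders" on shard-1 complete, isCompleteByName returns true for "orders" on shard-2, while isCompleteByUuid returns false. This is the kind of cross-shard mix-up that a per-schema identifier prevents.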
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3706      +/-   ##
============================================
+ Coverage     52.49%   58.76%   +6.27%
+ Complexity     6248     2138    -4110
============================================
  Files          1065      505     -560
  Lines         64318    29278   -35040
  Branches       7119     3209    -3910
============================================
- Hits          33765    17206   -16559
+ Misses        28268    11095   -17173
+ Partials       2285      977    -1308
The branch was force-pushed from 62552ce to b940a2c.
JDBC Engine Support for Multi-Source Readers
This is the fourth child of #3684. Here we refactor the JdbcIO wrapper to allow multi-shard reads. It will be followed by changes to the Pipeline Controller and to ReadWithUniformPartitions. Please see #3684 for the full details.
Design Decision
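This PR implements the core multi-shard logic within the JdbcIoWrapper.
Key Changes:
- Schema discovery runs for every shard configuration in the JdbcIoWrapperConfigGroup in parallel using parallelStream().
- The reader transforms from all shards are consolidated into a single ReadWithUniformPartitions instance.
Rationale: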
To support thousands of tables across hundreds of shards, sequential schema discovery was no longer feasible. By parallelizing this phase and consolidating the resulting transforms, we achieve both rapid job startup and a constant-sized Dataflow graph.
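The parallel discovery and consolidation described above can be sketched in plain Java as follows. The sketch makes several assumptions. ShardConfig is a hypothetical stand-in for one entry of the JdbcIoWrapperConfigGroup. discoverTables fakes the JDBC metadata queries. discoverAll shows the parallelStream() fan-out followed by flattening into one list, and a single reader transform could be built from that list.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of parallel per-shard discovery followed by aggregation (hypothetical types).
final class ParallelDiscovery {
  // Hypothetical per-shard config; stands in for one entry of a JdbcIoWrapperConfigGroup.
  static final class ShardConfig {
    final String shardId;
    final List<String> tables;

    ShardConfig(String shardId, List<String> tables) {
      this.shardId = shardId;
      this.tables = tables;
    }
  }

  // Hypothetical stand-in for the JDBC metadata queries run against one shard.
  static List<String> discoverTables(ShardConfig shard) {
    return shard.tables.stream()
        .map(table -> shard.shardId + "." + table)
        .collect(Collectors.toList());
  }

  // Runs discovery for all shards concurrently and flattens the results into one list,
  // which a single reader transform could then be configured from.
  static List<String> discoverAll(List<ShardConfig> shards) {
    return shards.parallelStream()
        .map(ParallelDiscovery::discoverTables)
        .flatMap(List::stream)
        .collect(Collectors.toList()); // keeps the shards' encounter order
  }
}
```

On the design choice: parallelStream() runs on ForkJoinPool.commonPool(), whose default parallelism is roughly the number of available processors minus one. JDBC metadata calls block on I/O, so a dedicated ExecutorService sized for the shard count is a common alternative. In this sketch, an exception thrown for any one shard propagates out of discoverAll.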
Why it's Safe (Concurrency & Error Isolation)
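Beam's FluentBackoff utility builds configurable exponential backoff policies for retry loops. The sketch below illustrates the same retry-with-backoff idea in plain Java so that it runs without Beam on the classpath. It is not FluentBackoff's API. The RetrySketch class, its TransientException marker, the parameters, and the doubling factor are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;

// Plain-Java illustration of retry with exponential backoff (not Beam's FluentBackoff API).
final class RetrySketch {
  // Hypothetical marker for failures considered transient, e.g. a dropped connection.
  static final class TransientException extends Exception {
    TransientException(String message) {
      super(message);
    }
  }

  // Calls `action`, retrying up to maxRetries times on TransientException and doubling
  // the sleep between attempts. Any other exception propagates immediately; once retries
  // are exhausted, the last TransientException is rethrown.
  static <T> T retryWithBackoff(Callable<T> action, int maxRetries, long initialDelayMillis)
      throws Exception {
    long delayMillis = initialDelayMillis;
    for (int attempt = 0; ; attempt++) {
      try {
        return action.call();
      } catch (TransientException e) {
        if (attempt >= maxRetries) {
          throw e;
        }
        Thread.sleep(delayMillis);
        delayMillis *= 2;
      }
    }
  }
}
```

For example, with maxRetries = 2, an operation that always fails transiently is attempted three times (the first call plus two retries) before the exception surfaces. FluentBackoff additionally applies randomization and supports caps such as a maximum backoff and a maximum cumulative backoff, which this sketch omits.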
Failed operations are retried with backoff (FluentBackoff), so that transient network issues don't trigger unnecessary job failures.
How to Verify
The added tests simulate multi-shard configurations and verify that parallel discovery produces the correct aggregated schema and a consolidated reader transform.
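One property such tests could check is that parallel discovery returns exactly what a sequential pass would. The sketch below uses fake in-memory shard schemas instead of real databases. The MultiShardDiscoveryCheck class, its fakeShardSchema helper, and the three-tables-per-shard layout are invented for illustration. It aggregates the shards both ways and compares the results.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class MultiShardDiscoveryCheck {
  // Hypothetical fake schema for one shard: table name -> column count (3 tables per shard).
  static Map<String, Integer> fakeShardSchema(int shard) {
    Map<String, Integer> tables = new TreeMap<>();
    for (int t = 0; t < 3; t++) {
      tables.put("table_" + t, shard + t + 1);
    }
    return tables;
  }

  // Flattens one shard's schema into sorted "shard_N/table:columns" entries.
  static List<String> describe(int shard) {
    return fakeShardSchema(shard).entrySet().stream()
        .map(e -> "shard_" + shard + "/" + e.getKey() + ":" + e.getValue())
        .collect(Collectors.toList());
  }

  // Aggregates every shard's entries, optionally discovering shards in parallel.
  static List<String> aggregate(int shardCount, boolean parallel) {
    IntStream ids = IntStream.range(0, shardCount);
    if (parallel) {
      ids = ids.parallel();
    }
    return ids.mapToObj(MultiShardDiscoveryCheck::describe)
        .flatMap(List::stream)
        .collect(Collectors.toList());
  }

  // Property a multi-shard test might assert: parallel and sequential results agree.
  static boolean parallelMatchesSequential(int shardCount) {
    return aggregate(shardCount, true).equals(aggregate(shardCount, false));
  }
}
```

The source range is ordered and collect(Collectors.toList()) preserves encounter order even for a parallel stream. Because of this, the comparison is deterministic rather than flaky.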